PC Week Labs
PC Week Labs Server Database Benchmark
Part I: Description
Ver. 2.7
January 2000
Timothy Dyck
Senior Analyst, PC Week Labs
519-746-4241
timothy_dyck@zd.com
Table of Contents
1. Document Purpose
2. Acknowledgements
3. Business Model
4. Benchmark Test Summary
   4.1. Scalability Definition
   4.2. Scalability Tests
        4.2.1. OLTP Single Read Test
        4.2.2. OLTP Read/Write Mix Test
        4.2.3. Mixed Workload Test
   4.3. Availability Definition
   4.4. Availability Tests
        4.4.1. Dedicated Reporting Test
        4.4.2. Online Reporting Test
        4.4.3. Online Backup Test
   4.5. Other Tests
        4.5.1. Load and Index Test
        4.5.2. Update Statistics Test
5. Benchmark Schedule
6. How Test Results Will Be Reported and Used
7. Full Disclosure
8. Sizing of the Benchmark
9. Benchmark Testbed
   9.1. Server Hardware Configuration
   9.2. Server Software Configuration
   9.3. Client Hardware Configuration
   9.4. Client Software Configuration
   9.5. Network Configuration
   9.6. Database Server Configuration
10. Benchmark Policies
    10.1. Test Run Times
    10.2. Think Times
    10.3. Test User Loads
    10.4. Random Number Generation
    10.5. Checkpoints
    10.6. Database Warm-up and Ensuring Steady State
    10.7. Cache Management
    10.8. Synchronization Points
    10.9. Transaction Isolation Level
    10.10. Results Verification
    10.11. All Computations Must Be Done In Real-Time
    10.12. Query Code Design and Allowable Modifications
    10.13. Captured Metrics
    10.14. Measurement Policy
    10.15. Product Configuration Submitted for Testing
Table of Tables
Table 1: Test User Loads
This document is Part I of the benchmark specification. It describes the test methodology, characteristics and rules for how PC Week Labs tests database servers.
Part II of the benchmark specification contains a complete benchmark implementation, including scripts for database tuning, creation and data load, plus the OLTP and DSS queries that form the benchmark. Vendors that agree to participate in the benchmark will receive this code two (2) weeks before they arrive at PC Week Labs for on-site testing.
If you are unable to comply with any of the policies outlined in this document, or have any questions or other comments, please send feedback as soon as possible to:
Timothy Dyck
Senior Analyst, PC Week Labs
timothy_dyck@zd.com
519-746-4241
These benchmark tests were originally derived from the AS3AP (ANSI SQL Standard Scalable and Portable) Benchmark for Relational Database Systems developed at Cornell University and the University of Illinois at Chicago.
The 1989 paper by Carolyn Turbyfill, Cyril Orji and Dina Bitton describing the benchmark can be found in The Benchmark Handbook for Database and Transaction Processing Systems, 2nd Edition, ed. Jim Gray, published by Morgan Kaufmann Publishers Inc.
Dina Bitton, Jeff Millman, Cyril Orji and Carolyn Turbyfill at Cornell University and at the University of Illinois at Chicago developed the data generator we are using in this benchmark, as3apgen.exe. We thank Dr. Bitton (now at IDS Integrated Data Systems Corp, www.ids-corp.com) for allowing us to distribute the data generator with our benchmark.
Brian Butler made many modifications to the base AS3AP query set to produce ver. 1.0 and 1.1 of this benchmark (the version used in our last comparative roundup published in October 1994).
Brian Butler later developed a benchmarking tool (Benchmark Factory 97) that we are using to run this benchmark. Benchmark Factory 97 is commercially available from Mr. Butler’s company, Client/Server Solutions Inc. (more information and trial versions of Benchmark Factory can be found at Client/Server Solutions’ web site, www.csrad.com).
This benchmark is modeled after the database needs of a small-to-medium-sized company. This company has 300 employees who have access to the database, most of whom use it for data entry and query, but a few also use it for data analysis, forecasting and production reporting.
The company has an IT staff of three people, one of whom allocates about half her time to administer the database system. The company has spent around $15,000 to purchase a single moderately powerful (but dedicated) database server to handle the load.
These characteristics shape the rest of the benchmark.
The preamble to the AS3AP paper describes that benchmark as “a comprehensive but tractable set of tests for database processing power.” AS3AP sets itself apart from other industry benchmarks (like the TPC-C) by including tests for load, export and index operations in addition to a broad OLTP and DSS workload.
While we have modified AS3AP in various ways to accommodate changing industry conditions, its original goals of reasonable comprehensiveness combined with tractability (ease of operation and scalability) remain our goals.
The benchmark is designed to measure two broad characteristics of a database server: scalability and availability.
In our case, we’re defining scalability as the database server’s ability to provide minimal and predictable degradation in per-user response times as user loads increase. To measure this, we will be holding server hardware resources and database size constant while varying concurrent user loads under a set of different query mixes.
Scalability will be measured in both absolute transactions per second and in various query response time metrics. The goal is to maximize throughput and minimize query response time.
We will be running three different scalability tests, using the following three workloads.
OLTP Single Read Test measures how many single-row reads accessing a single table can be completed within the measurement window. This test measures absolute database performance on a single, simple query.
This is a multi-user test with a high number of users. The metrics are transactions per second (higher is better) and client response time (lower is better).
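A minimal, single-client sketch of this test's inner loop, using Python's sqlite3 in-memory database as a stand-in for the server under test. The `updates` table name comes from the AS3AP schema; the column layout and measurement-window length here are illustrative assumptions, not part of the specification.

```python
import random
import sqlite3
import time

# Stand-in for the database server: an in-memory SQLite table.
# Column layout is hypothetical; only the single-row-read pattern matters.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE updates (key INTEGER PRIMARY KEY, filler TEXT)")
conn.executemany("INSERT INTO updates VALUES (?, ?)",
                 [(k, "x" * 92) for k in range(10_000)])
conn.commit()

rng = random.Random(1)     # each virtual client seeds its own generator
window = 0.5               # measurement window in seconds (illustrative)
completed = 0
start = time.perf_counter()
while time.perf_counter() - start < window:
    key = rng.randrange(10_000)
    row = conn.execute("SELECT key, filler FROM updates WHERE key = ?",
                       (key,)).fetchone()
    assert row is not None  # all output must be returned to the client
    completed += 1

print(f"{completed} single-row reads in {window}s "
      f"({completed / window:.0f} TPS)")
```

In the real benchmark this loop runs concurrently on hundreds of virtual clients with no think times, and throughput is summed across them.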
OLTP Read/Write Mix Test measures how many OLTP-type reads and writes accessing multiple tables can be completed within the measurement window. The insert, updates, deletes and selects range from very simple to moderately complex. This test measures database performance in a production OLTP environment.
This is a multi-user test with a high number of users. The metrics are transactions per second (higher is better) and client response time (lower is better).
Mixed Workload Test measures how many (read/write) OLTP- and (read-only) DSS-type transactions accessing multiple tables can be completed within the measurement window. The mix is heavily weighted towards OLTP transactions, but all transactions need to be treated equally by the database (no preferential treatment is allowed for OLTP transactions). This test measures database performance in a production OLTP environment when ad hoc analysis queries are also included.
This is a multi-user test with a high number of users. The metrics are transactions per second (higher is better) and client response time (lower is better).
We’re defining availability as the ability of the database server to either finish batch tasks in the shortest time possible or provide minimal and predictable degradation in per-user response times (using the OLTP Read/Write Mix Test) while batch tasks are concurrently running. The two batch jobs we will be running are a DSS report-creation test run in both dedicated and non-dedicated (i.e., online) configurations, and an online backup of the full database while the database is under load.
Availability will be measured by either total time or by transactions per second and various query response time metrics. The goal is to minimize completion time, or maximize OLTP throughput and minimize query response time (the primary metric) while still completing the concurrent batch jobs in a reasonable amount of time (the secondary metric). We will also be observing CPU utilization during the online reporting and backup tests to see how CPU use changes from the stand-alone OLTP Read/Write Mix Test.
We will be running three different availability tests, using the following three workloads.
Dedicated Reporting Test measures how fast the database can process a given set of DSS-type queries submitted in a fixed order when it is doing no other work. The queries range from simple to very demanding. This test measures database performance in a dedicated reporting environment.
This is a single-user test. The metric is total completion time and faster is better.
Online Reporting Test measures throughput and response time of the OLTP Read/Write Mix Test at a fixed user load while the database is also running the workload from the Dedicated Reporting Test above.
This is a multi-user test with a moderate number of users. The metrics are OLTP transactions per second (higher is better) and OLTP client response time (lower is better). We will also be measuring completion time for the report generation run, but this metric isn’t as important to us in this test as maintaining OLTP performance.
Online Backup Test measures throughput and response time of the OLTP Read/Write Mix Test at a fixed user load while the database also performs a full database backup (data, indices and transaction log). We will be using a single DLT tape drive as a backup target.
This is a multi-user test with a moderate number of users. The metrics are OLTP transactions per second (higher is better) and OLTP client response time (lower is better). We will also be measuring completion time for the database backup job, but this metric isn’t as important to us in this test as maintaining OLTP performance.
As part of the benchmark, we will need to perform a number of administrative tasks to set up the database. To reduce the number of optimization points in the benchmark, we will do these tasks ourselves before vendors arrive and don’t plan on reporting results for these tests. However, we will use the same methodology for each database and will record private results. If any particular database is dramatically better than the others (or dramatically worse)—in our case, this means more than a 50% difference in results using non-partitioned input files and a single instance of the data loader—we’ll put that into the text of the story.
Load and Index Test measures how long it takes for the database to load, index and update optimizer statistics for the test data set, stored locally as ASCII flat files. Vendors can load, index and update statistics separately or can combine these operations if their database provides this option. The metric is total time and faster is better.
Update Statistics Test measures how long it takes to update optimizer statistics for the updates and fourmill tables (using a full table scan—no sampling) after the OLTP Read/Write Mix Test has been run. The metric is total time and faster is better.
Vendors that agree to participate in this benchmark should notify PC Week Labs by e-mail. We will then arrange a mutually agreeable on-site testing date for the benchmark. The available dates are April 19, April 26, May 3 and May 10. (April 12 is a possibility too, we aren’t sure about that one yet.) Please provide two possible weeks where your technical representatives would come for on-site testing so we can deal with potential scheduling conflicts.
Two weeks before their finalized testing date, participating vendors will receive via e-mail the following components they will need to prepare for the benchmark:
- Part II of the benchmark specification (containing a full benchmark implementation customized to the vendor’s particular database)
- Data generator to generate a test data set
Vendors then have one week to examine the implementation and make any configuration changes they wish to optimize the implementation for their product and for our testbed.
Any changes need to be sent back to Timothy Dyck, who will decide if the changes will be allowed.
We will be tightly controlling allowable script modifications to ensure that any changes act solely to optimize the workload for a particular database’s syntax or feature set, not change the benchmark workload itself.
We will spend one week checking the code to ensure it meets all the rules required in the rest of this document and staging the benchmark hardware, including installing the database, creating devices, loading and indexing the data, and installing vendor-specific client-side code onto each physical client.
For on-site testing, we ask each vendor to send technical staff to PC Week’s New York City lab to conduct the benchmark. At that point, we will start recording performance data.
Vendors will have a maximum of one week on-site to complete one or more runs of the entire benchmark.
By participating in this benchmark, vendors are agreeing to allow Ziff-Davis to publish this data in any of their publications using any publication medium. Participating vendors should supply a letter or e-mail providing Ziff-Davis Inc. with written permission to publish such information before receiving part II of the benchmark specification (as per server database license agreement restrictions).
Vendors may not publish results of this benchmark (such as in advertising) other than to quote portions of our review or distribute reprints. (That is, cherry-picking individual results out of published results and presenting them in isolation is not allowed.)
Vendors must document all changes made to the tested configuration from standard installation defaults. We will be publishing selected portions of this information in our print review and will make full configuration information, query listings and results available to other vendors once the story has been published.
The benchmark data set is composed of about 38 million rows of data in twelve tables, with rows an average of 100 bytes wide. Thus, the entire data set takes up about 3.8GB of storage without indexing.
All vendors will use the same data set for the actual test.
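The sizing figures above can be checked with simple arithmetic:

```python
# Back-of-the-envelope check of the benchmark sizing figures.
rows = 38_000_000              # ~38 million rows across twelve tables
avg_row_bytes = 100            # average row width
total_gb = rows * avg_row_bytes / 1_000_000_000   # decimal gigabytes
print(f"{total_gb:.1f} GB of raw data before indexing")  # 3.8 GB
```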
We will be testing user loads up to 300 simultaneous connections (with no think times).
This section describes the operating environment for the test. All non-database components will be tightly controlled to ensure consistency and fairness. Except where specifically allowed, vendors may not change any components or settings.
To ensure all vendors have exactly the same test bed, we will wipe disks and restore all machines to the described initial base configuration between vendors. All vendors will use the same (identically configured) client and server hardware for all testing.
The database server hardware we’ll be using will be a Hewlett-Packard NetServer LH 4 with:
- 2 x 500MHz Pentium II Xeon CPUs with 1MB L2 cache each
- 512MB of RAM
- 1 internal SCSI controller
- 1 SCSI RAID array controller
- 3 x 9.1GB Fast Wide 7,200rpm SCSI drives (Seagate) connected to the internal SCSI controller above (the first disk will hold the OS, OS swap file and database executables; the second disk will hold the data import files; the third disk will hold the database transaction log)
- 6 x 9.1GB Fast Wide 7,200rpm SCSI drives (Seagate) connected to the SCSI RAID controller above (the disks will be grouped into a single RAID-5 stripe set, which will hold all database data devices)
- 2 x 10/100 HP D50BA Ethernet cards (Intel PRO/100 Server card)
- 1 CD-ROM drive (to be used for loading software and test data)
- 1 external DLT tape drive for database backups (one tape head)
We will tune the server’s system and controller settings once before any vendors arrive and will then lock down the hardware configuration. Vendors may not modify any BIOS or other machine hardware parameters.
The server operating system will be Windows NT Server 4.0. The system will be a standard install of Windows NT Server in standalone server mode. IIS will not be installed.
We will install NT Service Pack 5 on the machine.
The server will be running one network protocol, TCP/IP, and one network requester, Microsoft Windows Networking. See network configuration for more details.
The server will be configured not to provide any processing boost to foreground applications; NT’s “maximize throughput for network applications” option will be set; and the page file will be set to 1GB. The paging file will be on the server’s boot disk.
Vendors may add users, services, add or update operating system files, or add or update operating system variables but only by using standard installation routines or as advised by the database’s normal documented installation procedure. Operating environment modifications not documented in shipped CD or paper manuals will not be allowed.
Any other operating system configuration changes or modification will not be allowed. These include patching OS or database executable files, turning off normally running services, modifying network stacks, binding network card interrupts to specific CPUs and so on.
In particular, exactly the following non-database services must be running on the server system:
- Alerter, Computer Browser, Event Log, License Logging Service, Messenger, Net Logon, NT LM Security Support Provider, Plug and Play, Protected Storage, Remote Procedure Call (RPC) Service, Server, TCP/IP NetBIOS Helper, Workstation
In summary, OS tuning procedures won’t be allowed unless they are performed automatically by the database installation routine or are documented in the database installation instructions in such a way that all users would make the modification.
We will be re-imaging the OS drive from a pre-setup master and reformatting all other drives between vendors.
We have 18 physical client systems, a mix of Pentium and Pentium II workstations. They are each configured with:
- At least a 166MHz Pentium CPU
- 64MB of RAM
- 1 3Com 3c905b 10/100 Ethernet card
- At least a 1.6GB IDE disk (to be used for the operating system, OS swap space and database client code)
Client BIOS settings will be left at default values. Vendors may not modify any BIOS or other machine hardware parameters.
The client operating system will be Windows NT Workstation 4.0. The operating system will be installed using all default settings and we will install NT Service Pack 4 on all machines.
The clients will be running one network protocol, TCP/IP, and one network requester, Microsoft Windows Networking. See network configuration for more details.
We will install database client software onto each client using vendor-supplied recommended installation guidelines and verify connectivity to the server. Vendors can choose to use a native communication protocol or they can use their ODBC driver if they include one with their client software. No other client configuration changes are permitted.
The network will be a single isolated Ethernet containing the database server, database benchmark controller and clients. The clients will be segmented into two networks (and two subnets) of 9 systems each and both the database server and controller will have two network cards connecting them to each group of clients.
All machines have 100Mbps Ethernet adapters and are connected using 100Mbps Ethernet switches, offering 100Mbps throughput throughout the network. All devices are configured to run in half-duplex mode.
TCP/IP will be the network protocol used by all systems. The benchmark controller will be running a WINS server for NetBIOS to IP name resolution. Vendors can use Named Pipes or pure TCP/IP as their base communication protocol.
Vendors cannot otherwise change any network settings.
Vendors may make any database configuration changes they wish (for example, cache sizes, number of threads/processes, optimizer settings and any other engine configuration changes) to ensure optimal performance. Vendors may start the database engine using any command-line parameters desired.
The transaction log disk and data device RAID volume can be formatted using either NTFS or used raw, at the vendor’s option.
All tuning decisions as well as all engine and command line parameters used must be described in the product’s documentation and recorded for purposes of full disclosure. As described above, these parameters need to be vetted by PC Week to ensure fairness.
In addition, we’ll be using a single database configuration and disk layout for all the tests. The database can’t be reconfigured or re-tuned between tests. This is a typical requirement for the mid-size multi-purpose database market we are modeling.
Here are the user loads per test:
Test Name                  Number of Users
-------------------------  -------------------------------------------
OLTP Single Read Test      1, 25, 50, 75, 100, 200, 300 (7 iterations)
OLTP Read/Write Mix Test   1, 25, 50, 75, 100, 200, 300 (7 iterations)
Mixed Workload Test        1, 25, 50, 75, 100, 200, 300 (7 iterations)
Online Reporting Test      300
Online Backup Test         300

Table 1: Test User Loads
Both the benchmark control logic and transaction code depend on random number sequences for their operation. The random number generator is seeded with a unique value at the start of each test run (the seed is the numeric ID of each virtual database client).
Benchmark Factory uses a public 32-bit algorithm published in Dr. Dobb’s Journal a few years ago, and the code is included in the Benchmark Factory manual. Earlier versions of Benchmark Factory publish the code on pp. 15-20 to 15-22 of the first (single volume) version of the manual. Later (current) versions of Benchmark Factory have two manuals; the code is on pp. 11-18 to 11-20 of the Benchmark Factory Reference Guide.
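The manual reproduces the exact routine; as an illustration of the idea, a generic 32-bit linear congruential generator seeded with a client ID behaves as sketched below. The constants are the well-known Numerical Recipes values, not necessarily the ones Benchmark Factory uses.

```python
class Lcg32:
    """Generic 32-bit linear congruential generator.

    Illustrative only: the multiplier and increment are the common
    Numerical Recipes constants, which may differ from the algorithm
    published in the Benchmark Factory manual.
    """
    def __init__(self, seed: int):
        self.state = seed & 0xFFFFFFFF  # seed = virtual client's numeric ID

    def next(self) -> int:
        self.state = (1664525 * self.state + 1013904223) & 0xFFFFFFFF
        return self.state

# Seeding each virtual client's generator with its numeric ID makes every
# run replay the same per-client sequence, so results are repeatable.
a, b = Lcg32(seed=7), Lcg32(seed=7)
assert [a.next() for _ in range(5)] == [b.next() for _ in range(5)]
```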
Tested databases should be configured to checkpoint often enough so that recovery time is kept to 1 minute or less.
The first run of each test will be a 100 user workload run of the given test mix just to warm up the cache. Results for this run will be discarded.
When the actual test run starts, we will verify the tested database has reached a steady state before the measurement interval begins by inspecting the real-time performance curves returned during the benchmark.
The database will be left running (with cache un-flushed) between multiple user load repetitions of the same test.
We will start and stop the database between each test.
A commit (synchpoint) must be issued after every update transaction.
All the write transactions must be coded so non-repeatable reads are not possible. Vendors may ensure this by setting the transaction isolation level for these transactions to Repeatable Read or Serializable, or hint the transactions so the appropriate selects take update locks. All other transactions must (at least) be run at ANSI Isolation Level 2, Read Committed.
We will run a single iteration of all tests containing queries in single user mode and capture returned results to disk. This output file will be analyzed to verify correct results are returned for all queries. Times for this run will not be reported.
We will run check scripts to verify consistency and atomicity in the OLTP Read/Write Mix Test.
Any form of query pre-computation or post-execution result caching (through summary tables, materialized views, etc.) is not allowed. Query results must be computed only at the time the query is submitted.
OLTP transactions can be implemented as ad hoc dynamic SQL, stored procedures or client-side C/C++ procedures.
When run in the OLTP mixes, the DSS queries can be implemented as ad hoc dynamic SQL, stored procedures or client-side C/C++ procedures. When run on their own (as part of the Dedicated Reporting Test and Online Reporting Test), they must be submitted as ad hoc dynamic SQL statements.
For all our tests, vendors will not be allowed to rewrite queries for improved performance—we want to see the database optimizer’s ability to rewrite queries as well as pick optimal joins and indices. In particular, we do not allow optimizer or cache hinting, either in submitted SQL or in database catalogs.
The only exception to this rule is in the Availability tests, where vendors can make query or catalog changes to give preference to the OLTP queries. Otherwise, vendors may make syntactical changes to the queries only if they will not run as written. Such changes must be as minimal as possible to preserve comparability between tested systems. We will use the query rewrite guidelines in Chapter 2 of the TPC-D specification as our guide to what allowable changes are. Clear any potential change with Timothy Dyck before submitting code for final review.
In all cases, all transaction output must be returned to the client (no discarding rows on the server).
We will be capturing the following metrics:
- Transactions per second (higher figures indicate better performance)
- Average, minimum, maximum and 90th percentile times for both first row and last row return times (lower figures indicate better performance)
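As a sketch of how one run's raw timings reduce to these metrics, assuming response times are collected in milliseconds (the nearest-rank percentile convention below is our assumption; the specification does not pin down a method):

```python
import statistics

def summarize(times_ms):
    """Reduce one run's response times to the captured metrics.

    Uses the nearest-rank convention for the 90th percentile (an
    assumption; the benchmark text does not specify a method).
    """
    ordered = sorted(times_ms)
    rank = -(-9 * len(ordered) // 10)   # ceil(0.9 * n)
    return {
        "avg": statistics.mean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p90": ordered[rank - 1],
    }

def throughput_tps(transactions, window_seconds):
    """Transactions per second over the measurement window."""
    return transactions / window_seconds

# Example: ten response times from 1 ms to 10 ms.
m = summarize(range(1, 11))
assert (m["avg"], m["min"], m["max"], m["p90"]) == (5.5, 1, 10, 9)
assert throughput_tps(120, 60) == 2.0
```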
A test will be considered finished when we have run it enough times to produce at least three test results that are within 5% of each other. This is defined to occur when the peak throughput of a given run has the peak throughputs of two other runs within +/- 2.5% of its value. We will repeatedly run each test until this happens.
The final, “official” result for the test will be the average of the results of these three individual test results.
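This stopping rule and the averaging of the official result can be expressed as a small predicate; the function names below are ours, chosen for illustration:

```python
def qualifying_runs(peaks):
    """Return three runs satisfying the stopping rule, or None.

    The rule: some run's peak throughput must have the peak throughputs
    of at least two other runs within +/- 2.5% of its value.
    """
    for i, centre in enumerate(peaks):
        near = [p for j, p in enumerate(peaks)
                if j != i and abs(p - centre) <= 0.025 * centre]
        if len(near) >= 2:
            return [centre, near[0], near[1]]
    return None

def official_result(peaks):
    """Average of the three qualifying results, once the rule is met."""
    runs = qualifying_runs(peaks)
    return sum(runs) / 3 if runs else None

assert qualifying_runs([100.0, 110.0, 90.0]) is None   # keep running
assert official_result([100.0, 101.0, 99.0]) == 100.0  # done: average of 3
```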
If vendors have packaged their database systems into multiple configurations aimed at different market segments, contact PC Week Labs to reach a mutual agreement on which package is most representative for our mid-size organization workload.
Deliver the product without any extra-cost options or add-ons (other than the required number of user licenses). Products must be delivered in a shrink-wrapped box, and must be currently available, purchasable shipping code.
All tests must be performed using only the standard tools and/or APIs each vendor provides with their database products. In particular, we will not be testing with a transaction monitor. The database should handle client connection management and transaction queuing itself.